Summary

This document summarizes transcript leader (5' UTR) lengths in different annotations of Saccharomyces species: one from the yeast genome annotation (YG, Saccharomyces Genome Database), and the others from Spealman et al. 2018 (SM).

Load packages

Load data

## # A tibble: 27,412 x 22
##    Gene  aATG.context Length d1.context d1.posTSS d1.posATG d1.frame
##    <chr> <chr>         <int> <chr>          <int>     <int>    <int>
##  1 YAL0… gccacaagaaa…      1 GTTCGTGCT…       153       152        2
##  2 YAL0… attacacctag…      1 gaATGGAGC…        11        10        1
##  3 YAL0… TATACACACAT…     51 AGAATTCTC…       200       149        2
##  4 YAL0… ttcaccaccca…      1 ATACCGTCT…        47        46        1
##  5 YAL0… ATAACAGATAA…     63 CTCACTTTG…       124        61        1
##  6 YAL0… TAAAGGAAAAC…     76 CATCTTCCA…       161        85        1
##  7 YAL0… AATAGGTGTAA…     98 TTGGCTTTT…       116        18        0
##  8 YAL0… ataaaggaggt…      1 AGAGCATAG…        23        22        1
##  9 YAL0… AGACCGATCTT…     42 ATGCTACCC…        54        12        0
## 10 YAL0… agacaagtaaG…      3 GTGGTCGTG…       106       103        1
## # … with 27,402 more rows, and 15 more variables: d2.context <chr>,
## #   d2.posTSS <int>, d2.posATG <int>, d2.frame <int>, u1.context <chr>,
## #   u1.posTSS <int>, u1.posATG <int>, u1.frame <int>, u2.context <chr>,
## #   u2.posTSS <int>, u2.posATG <int>, u2.frame <int>, Organism <fct>,
## #   uATGCt <int>, uATGCtmin20 <int>

Only about 2,200 to 2,900 UTRs are annotated in each set (column Nlong below)

## # A tibble: 5 x 7
##   Organism uAUGtot Lengthtot Nlong Ntmeas  Ntot NuAUG
##   <fct>      <int>     <dbl> <int>  <int> <int> <int>
## 1 cer_YG       936    234564  2835   6571  6571   504
## 2 cer_SM      2904    369593  2943   6446  6460   721
## 3 kud         1052    214669  2293   4780  4780   364
## 4 par         1047    214672  2322   4855  4855   397
## 5 uva          985    202024  2222   4636  4746   370
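The totals above can also be read as a rough uAUG density. The sketch below re-enters the uAUGtot and Lengthtot values from the table by hand and divides one by the other, scaled to uAUGs per kilobase of leader. This is an assumption about how to read the totals, not necessarily the density shown in the figure: in particular, Lengthtot may include placeholder lengths for unannotated genes. The name `per_kb` is introduced here for illustration.

```r
# Totals copied by hand from the summary table above.
totals <- data.frame(
  Organism  = c("cer_YG", "cer_SM", "kud", "par", "uva"),
  uAUGtot   = c(936, 2904, 1052, 1047, 985),
  Lengthtot = c(234564, 369593, 214669, 214672, 202024)
)

# uAUGs per 1000 nt of summed leader length (hypothetical summary statistic).
per_kb <- setNames(1000 * totals$uAUGtot / totals$Lengthtot, totals$Organism)

print(round(per_kb, 2))
```

On this reading, cer_SM has the highest density of the five annotations, and the other four are close to one another.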

Plot UTR length distributions for different species

## # A tibble: 5 x 6
##   Organism `25%`  `5%` `50%` `75%` `95%`
##   <fct>    <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 cer_YG      39    19  60      97  227.
## 2 cer_SM      28     9  53     130  551.
## 3 kud         27    10  50     103  325.
## 4 par         28    10  49.5   100  330.
## 5 uva         28    11  49      97  314
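A table like the one above can be produced by taking quantiles of Length within each Organism. The following is a minimal base R sketch on invented toy data, not the code used for this report. It assumes that genes with Length of 1 are unannotated placeholders and drops them first; the cutoff `min_length` and the object names are hypothetical.

```r
# Toy stand-in for the real table: one row per gene. Values are invented.
leaders <- data.frame(
  Organism = factor(c("cer_YG", "cer_YG", "cer_YG", "cer_YG",
                      "kud", "kud", "kud", "kud")),
  Length   = c(1L, 40L, 60L, 100L, 1L, 30L, 50L, 90L)
)

# Assumed cutoff: keep only leaders longer than this many nucleotides.
min_length <- 1L
annotated <- leaders[leaders$Length > min_length, ]

# Quantiles of leader length within each organism, one row per organism.
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
length_quantiles <- t(sapply(
  split(annotated$Length, annotated$Organism),
  quantile, probs = probs
))

print(length_quantiles)
```

Building the matrix from an ordered `probs` vector keeps the columns in ascending order (5%, 25%, 50%, 75%, 95%).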

Check UTR length distributions for Arribere et al

Transcript leader length calls, in nucleotides, from Arribere and Gilbert 2013 (GEO samples GSM1120728 and GSM1120729).

## # A tibble: 2 x 6
##   Rep   `25%`  `5%` `50%` `75%` `95%`
##   <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Rep1     24     8    42    74   203
## 2 Rep2     24     8    43    74   197

Plot uAUG distributions for different species

The proportion of TLs containing uAUGs is computed only over TLs with non-zero length; TLs of zero length are treated as missing data.
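The missing-data rule described above can be sketched as follows, in base R on invented toy data. It assumes that the uATGCt column counts uAUGs in each leader, which is a guess from the column name; `has_uaug` and `prop_uaug` are hypothetical names.

```r
# Toy stand-in for the real table. Values are invented.
leaders <- data.frame(
  Organism = factor(c("cer_YG", "cer_YG", "cer_YG", "cer_YG",
                      "kud", "kud", "kud")),
  Length   = c(0L, 40L, 60L, 100L, 0L, 30L, 50L),
  uATGCt   = c(0L, 0L, 1L, 2L, 0L, 0L, 1L)
)

# Zero-length leaders become NA, so they count as missing rather than
# as leaders without a uAUG.
has_uaug <- ifelse(leaders$Length > 0, leaders$uATGCt > 0, NA)

# Proportion of non-missing leaders with at least one uAUG, per organism.
prop_uaug <- tapply(has_uaug, leaders$Organism, mean, na.rm = TRUE)

print(prop_uaug)
```

Treating zero-length leaders as FALSE instead of NA would enlarge the denominator and lower every proportion, which is why the filter matters when annotation coverage differs between sets.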

Supplementary figure: Saccharomyces sensu stricto species have short, AUG-poor transcript leaders

We compared transcript leader length (A), the proportion of transcript leaders containing uAUGs (B), and the density of uAUGs in transcript leaders (C) between annotations of Saccharomyces yeasts. Annotations are abbreviated as: cer_YG, S. cerevisiae S288C from the Saccharomyces Genome Database (Cherry et al. 2013); cer_SM, S. cerevisiae S288C from Spealman et al. (2018); kud, S. kudriavzevii FM1340; par, S. paradoxus CBS432; uva, S. bayanus var. uvarum JRY9191. The latter three are also from Spealman et al. All annotations show a similar median leader length of 49-60 nucleotides.